Abstract - We show that modularity maximization with the resolution parameter offers a unifying 
framework of graph partitioning. In this framework, we demonstrate that the spectral method 
exhibits universal detectability, irrespective of the value of the resolution parameter, as long as 
the graph is partitioned. Furthermore, we show that when the resolution parameter is sufficiently 
small, a first-order phase transition occurs, resulting in the graph being unpartitioned. 


Introduction. — Graph partitioning is often analyzed as a fundamental problem to understand
the performance of community detection in complex networks. Graph partitioning was originally
an optimization problem; for a given number of modules, the problem is to find the partition
with the sparsest cut under the constraint that the sizes of the modules are exactly or nearly
equal. When graph partitioning is applied to a social network with a modular structure, for
instance, the nodes identified as members of a module are expected to belong to the same
social group.


To clarify the perspective of graph partitioning as community detection, let us consider the
partitioning of a uniform random graph (i.e., a random graph without planted block structures)
as an example. While it is still meaningful to find the optimal partition for each instance
when the problem is regarded purely as an optimization problem, the result is hardly
significant for community detection; to be significant, it should be statistically different
from the results for uniform random graphs. Therefore, it is of significant importance to
ascertain each algorithm’s performance. Interestingly, even when we generate random graphs
with a planted modular structure, which have higher edge density within a module than between
modules, the average performance of a partition may be indistinguishable from that of uniform
random graphs as long as the modular structure is not sufficiently clear. This
indistinguishable region is called the undetectable phase, while the region where the
partition is positively correlated to the planted modules is called the detectable phase. The
boundary is called the detectability threshold [?]. Since many real networks are sparse, we
focus on the case of sparse graphs. That is, the average degree does not increase as the total
number of nodes increases.

Because graph partitioning is usually formulated as a discrete optimization problem, which is
computationally expensive, the spectral method [?], which solves the continuous relaxation of
the original problem, is often used. Whereas the performance of a partition generally depends
on the choice of the objective function to be optimized, it was shown in [?] that the spectral
methods for three popular objective functions, namely modularity, normalized cut, and the
log-likelihood of the degree-corrected stochastic block model, reduce to an eigenvalue problem
of the normalized Laplacian when an elliptical normalization is considered as a constraint.

In this paper, we first show that the above three objective functions can be formulated in the
framework of modularity maximization at the level of discrete optimization. While the
modularity form of the log-likelihood of the degree-corrected stochastic block model was
already derived in [?], we show that the normalized cut can also be formulated in the same
framework. We then conduct a detectability analysis of the spectral method of the modularity
with the spherical normalization constraint, i.e., the method with the modularity matrix. The
detectability analysis of the spectral method with the normalized Laplacian for sparse graphs
was performed in [?].

An important difference between graph partitioning and community detection lies in whether the
number of modules is given or to be estimated. While the graph is always partitioned into a
given number of modules when the normalized Laplacian is used, the method with the modularity
matrix has its own criterion that determines whether the graph should or should not be
partitioned. We refer to the parameter region where the graph is not partitioned as the
unpartitioned phase. To our knowledge, previous detectability analyses have not been concerned
with this unpartitioned phase. Focusing on the bisection problem, our analysis here reveals
the relation between the detectable phase, undetectable phase, and unpartitioned phase.

Unifying framework of graph partitioning. We consider the bipartitioning of a graph $G(V, E)$
with a node set $V$ and an edge set $E$. We denote the number of nodes as $N$ ($= |V|$) and the
total degree, or the graph’s volume, as $K$ ($= 2|E|$). The two subsets of nodes obtained by
the partition and their total degrees are denoted by $S_r$ and $K_r$ ($r = 1, 2$),
respectively. We indicate the set of edges that connect nodes in $S_1$ and $S_2$ as
$E(S_1, S_2)$.

The modularity with the resolution parameter, the objective function to be maximized, is

$$ Q_\theta(S_1, S_2) = \frac{1}{K} \sum_{r} \sum_{i,j \in S_r} \left( A_{ij} - \theta \frac{c_i c_j}{K} \right), \tag{1} $$

where $A$ is the adjacency matrix and $c_i$ represents the degree of node $i$. The modularity
function distinguishes the connectivity of the actual graph, $A_{ij}$, and the corresponding
value of its null model, $c_i c_j / K$, in each module. The resolution parameter $\theta > 0$
controls the balance between them. Although the choice of the null model is generally
arbitrary, we employ, as in most of the literature, a random graph whose expected degree
sequence is equal to that of the actual graph. Because we focus upon bisection in the present
study, the modularity function $Q_\theta$ can be expressed using a spin representation
$s_i = \pm 1$ ($i = 1, \dots, N$). Ignoring the constants irrelevant to the partition, we have

$$ Q_\theta(s) = s^{\top} B s = s^{\top} \left( A - \theta \frac{c\, c^{\top}}{K} \right) s, \tag{2} $$

where the vector $c = (c_1, \dots, c_N)$ has the degree of each node as its components and
$\top$ indicates the transpose. The matrix $B$ is called the modularity matrix, and $\theta$ is
usually set to unity when bisection is considered. As we show in what follows, this framework
contains the normalized cut minimization as a special case. The method of maximum
log-likelihood is also a special case of this framework that has a particular value of
$\theta$ [?]. A similar argument was presented in [?].
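As a minimal sketch of the quantities just defined, the following NumPy snippet builds $B = A - \theta\, c\, c^{\top} / K$ and evaluates $s^{\top} B s$ for a spin vector. The function names `modularity_matrix` and `q_theta`, and the six-node toy graph, are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def modularity_matrix(A, theta=1.0):
    """Return B = A - theta * c c^T / K for a symmetric adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    c = A.sum(axis=1)   # degree vector c
    K = c.sum()         # total degree K = 2|E|
    return A - theta * np.outer(c, c) / K

def q_theta(A, s, theta=1.0):
    """Evaluate s^T B s for a spin vector s with entries +1 or -1."""
    s = np.asarray(s, dtype=float)
    return float(s @ modularity_matrix(A, theta) @ s)

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A_TOY = np.zeros((6, 6))
for i, j in EDGES:
    A_TOY[i, j] = A_TOY[j, i] = 1.0

s_blocks = np.array([1, 1, 1, -1, -1, -1])  # one triangle per module
s_all = np.ones(6)                          # unpartitioned assignment

print(q_theta(A_TOY, s_blocks, theta=1.0))
print(q_theta(A_TOY, s_all, theta=1.0))
```

For $\theta = 1$ the unpartitioned assignment gives $s^{\top} B s = 0$ up to rounding, because $s^{\top} A s = K$ and $(c^{\top} s)^2 / K = K$; lowering $\theta$ makes the unpartitioned assignment increasingly competitive.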

The objective function $f_{\mathrm{Ncut}}$ of the normalized cut is

$$ f_{\mathrm{Ncut}}(S_1, S_2) = K \frac{|E(S_1, S_2)|}{K_1 K_2} \tag{3} $$

for bisection. We denote the minimum value of $f_{\mathrm{Ncut}}(S_1, S_2)$ as $\theta^*$,
i.e., for any partition,

$$ K \frac{|E(S_1, S_2)|}{K_1 K_2} \ge \theta^*. \tag{4} $$


By using the relations $|E(S_1, S_2)| = (K - s^{\top} A s)/4$, $K_1 = (K + c^{\top} s)/2$, and
$K_2 = (K - c^{\top} s)/2$, we can recast (4) as

$$ s^{\top} A s - \theta^* \frac{(c^{\top} s)^2}{K} \le K (1 - \theta^*), \tag{5} $$

where we excluded the unpartitioned case, which is singular in (4). Note that the right-hand
side of (5) does not depend on the partition. The equality holds when the left-hand side is
maximized, which only occurs when the optimum partition is achieved unless nontrivial
degeneracies exist; although the unpartitioned case also achieves the equality in (5), this
choice is excluded. Therefore, if the optimum value $\theta^*$ is known, minimizing the
normalized cut is equivalent to maximizing modularity with the resolution parameter
$\theta^*$, i.e.,

$$ \mathop{\mathrm{arg\,min}}_{\{S_1, S_2\}} f_{\mathrm{Ncut}}(S_1, S_2) = \mathop{\mathrm{arg\,max}}_{\{S_1, S_2\}} Q_{\theta^*}(S_1, S_2). \tag{6} $$
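The step from (4) to (5) can be spelled out as follows. This is a sketch that uses only the three relations quoted in the text, together with the assumption that the partition is nontrivial, so that $K_1 K_2 > 0$ and multiplying by $K^2 - (c^{\top} s)^2$ preserves the inequality.

```latex
\begin{align*}
K \frac{|E(S_1,S_2)|}{K_1 K_2}
  &= \frac{K \,(K - s^{\top} A s)/4}{\bigl(K^2 - (c^{\top} s)^2\bigr)/4}
   = \frac{K \,(K - s^{\top} A s)}{K^2 - (c^{\top} s)^2} \;\ge\; \theta^{*} \\
\Longrightarrow\quad
K - s^{\top} A s &\ge \theta^{*} K - \theta^{*} \frac{(c^{\top} s)^2}{K} \\
\Longrightarrow\quad
s^{\top} A s - \theta^{*} \frac{(c^{\top} s)^2}{K} &\le K \,(1 - \theta^{*}).
\end{align*}
```

For the unpartitioned assignment $s = (1, \dots, 1)$ one has $s^{\top} A s = K$ and $(c^{\top} s)^2 / K = K$, so the left-hand side equals $K(1 - \theta^*)$, which is why that case also attains the equality and has to be excluded by hand.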

Because we do not know the minimum value $\theta^*$ of the normalized cut a priori, the above
argument is completely formal. However, Eq. (6) rules out the possibility that the optimum
partition in the sense of the normalized cut differs from the optimum partition in the sense
of modularity with $\theta = \theta^*$.
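The equivalence can be checked by brute force on a very small graph. The sketch below enumerates all nontrivial bisections, computes $\theta^*$ as the minimum normalized cut, and compares the minimizer with the maximizer of $s^{\top} A s - \theta^* (c^{\top} s)^2 / K$. The helper names and the toy graph are hypothetical; exhaustive enumeration is only feasible for a handful of nodes and is not the method analyzed in the paper.

```python
import itertools
import numpy as np

def ncut(A, s):
    """Normalized cut K |E(S1,S2)| / (K1 K2) of the bisection encoded by s."""
    c = A.sum(axis=1)
    K = c.sum()
    cut = (K - s @ A @ s) / 4.0
    K1 = (K + c @ s) / 2.0
    K2 = (K - c @ s) / 2.0
    return K * cut / (K1 * K2)

def q_theta(A, s, theta):
    """Modularity objective s^T A s - theta (c^T s)^2 / K."""
    c = A.sum(axis=1)
    K = c.sum()
    return s @ A @ s - theta * (c @ s) ** 2 / K

def nontrivial_bisections(n):
    """Yield spin vectors with s[0] = +1, skipping the unpartitioned one."""
    for tail in itertools.product([1.0, -1.0], repeat=n - 1):
        s = np.array((1.0,) + tail)
        if not np.all(s == 1.0):
            yield s

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in EDGES:
    A[i, j] = A[j, i] = 1.0

parts = list(nontrivial_bisections(6))
theta_star = min(ncut(A, s) for s in parts)
best_ncut = min(parts, key=lambda s: ncut(A, s))
best_q = max(parts, key=lambda s: q_theta(A, s, theta_star))
K = A.sum()

print(theta_star, best_ncut, best_q)
```

On this graph both optimizers pick the split that cuts the single bridging edge, and the maximal objective value equals $K(1 - \theta^*)$, as (5) requires.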

We now consider the spectral method for (2). As in [?], we relax the optimization of the spin
variables $s$ to a continuous vector $x$ with the spherical normalization condition
$|x|^2 = N$. This leads to the eigenvalue problem of the modularity matrix $B$, and we
determine the partition based on the signs of the leading eigenvector elements. That is, each
element in the leading eigenvector corresponds to the weight of a vertex, and we identify the
set of vertices with weights of the same sign as a module. The unpartitioned phase is the case
in which every node has the same sign of weight. Note that the 1-vector, the vector in which
all elements are equal to unity, is not an eigenvector when $\theta \neq 1$. However, as we
show, the leading eigenvector is orthogonal to the 1-vector in the detectable region.
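The procedure described above can be sketched in a few lines of NumPy. The function name `spectral_bisection`, the dense eigensolver, and the toy graph are assumptions made for illustration; for large sparse graphs an iterative eigensolver would be the natural choice.

```python
import numpy as np

def spectral_bisection(A, theta=1.0):
    """Bisect a graph by the signs of the leading eigenvector of B.

    Returns (leading eigenvalue, labels in {+1, -1}, partitioned flag).
    The flag is False when every node gets the same sign, i.e. the
    unpartitioned case described in the text.
    """
    A = np.asarray(A, dtype=float)
    c = A.sum(axis=1)
    K = c.sum()
    B = A - theta * np.outer(c, c) / K
    eigvals, eigvecs = np.linalg.eigh(B)   # eigenvalues in ascending order
    lead_val = float(eigvals[-1])
    x = eigvecs[:, -1]                     # leading eigenvector
    labels = np.where(x >= 0, 1, -1)
    partitioned = bool(np.any(labels == 1) and np.any(labels == -1))
    return lead_val, labels, partitioned

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A_TOY = np.zeros((6, 6))
for i, j in EDGES:
    A_TOY[i, j] = A_TOY[j, i] = 1.0

for theta in (1.0, 0.05):
    lam, labels, partitioned = spectral_bisection(A_TOY, theta)
    print(theta, lam, labels, partitioned)
```

On this toy graph, $\theta = 1$ separates the two triangles, whereas $\theta = 0.05$ yields a leading eigenvector whose elements all share one sign, so the graph is left unpartitioned. This is a finite-size illustration only and says nothing about the thresholds derived for the ensemble.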

Largest eigenvalue of the modularity matrix. — 

We analyze the performance of the spectral method for an ensemble of random graphs with a
planted 2-block structure. We denote the node sets of the planted modules as $V_1$ and $V_2$,
where $|V_1| = p_1 N$ and $|V_2| = p_2 N$ ($p_2 = 1 - p_1$). We impose the constraint that the
number of edges between the blocks is $\gamma N$. The rest of the edges are placed randomly
within each module so that the node degrees follow a given degree distribution $\{b_t\}$,
where $b_t$ represents the fraction of nodes with degree $c_t$. The average degree is denoted
by $c$ ($= \sum_t b_t c_t$). As we have $\gamma = c p_1 p_2$ for a uniform random graph, it is
natural to consider $\Gamma = 1 - \gamma / (c p_1 p_2)$: $\Gamma = 1$ when the modules are
completely disconnected, and $\Gamma = 0$ for a uniform random graph.

Our goal is to evaluate the ensemble averages of the leading eigenvalues and their eigenvector
distributions as functions of $\theta$ and $\Gamma$. This allows us to measure the correlation
between the partition obtained by the spectral method $\{S_1, S_2\}$ and the planted partition
$\{V_1, V_2\}$. As in the previous works [?], the largest eigenvalue of the modularity matrix
$B$ can be calculated as

$$ \lambda_1 = 2 \lim_{\beta \to \infty} \frac{1}{\beta N} \ln Z(\beta | B), \tag{7} $$

where

$$ Z(\beta | B) = \int dx \, \exp\!\left( \frac{\beta}{2} x^{\top} B x \right) \delta(|x|^2 - N). \tag{8} $$

For the average largest eigenvalue, the replica trick yields

$$ [\lambda_1]_B = 2 \lim_{\beta \to \infty} \frac{1}{\beta N} \lim_{n \to 0} \frac{\partial}{\partial n} \ln [Z^n(\beta | B)]_B, \tag{9} $$

where we denote the ensemble average over random graphs as $[\,\cdot\,]_B$. We consider the
limit $N \to \infty$ and evaluate (9) using the saddle-point method with the replica-symmetric
ansatz. After a calculation analogous to that in [?], we arrive at a saddle-point expression
of $[\lambda_1]_B$ (see Appendix for the specific form of $[\lambda_1]_B$). It is composed of
distributions of the order parameter functions $q_r(A, H)$ ($r = 1, 2$), which appear in
$[Z^n(\beta | B)]_B$; their conjugate distributions $\hat{q}_r(A, H)$; and auxiliary variables
$\phi$ and $D$, which originate from the normalization constraint and the penalty term
$(c^{\top} x)^2 / K$.

Solving for the saddle point in the entire function space is, however, not feasible
analytically or numerically. Thus, we restrict the distributions $q_r(A, H)$ and
$\hat{q}_r(A, H)$ to the simple forms $q(A) = \delta(A - a)$ and
$\hat{q}(A) = \delta(A - \hat{a})$. While such distributions actually provide the exact saddle
point for random regular graphs, they are approximations in general; this is called the
effective medium approximation (EMA). Under this restriction, we can determine the average
first eigenvalue $[\lambda_1]_B$ and the saddle-point conditions in analytic forms by using
the functions


Rn{(j),a)-^ 

^ (p- cta 

( 10 ) 


( 11 ) 


and the moments $m_{nr} = \int dH \, q_r(H) H^n$ and
$\hat{m}_{nr} = \int dH \, \hat{q}_r(H) H^n$. The average first eigenvalue $[\lambda_1]_B$
becomes

[Ai]^ = extr^f/) -I- 2DD — 

"q 

-r ((^2r) + 2 {mirihir) + {rh2r)) 

a — a 

+ 

7 2 

-5-7 {mil - miz) 

— 1 

-f R 2 { 4 >, d) - 2D {rhir) + (rhf^)'j 

+ Ri{(j}, d) {{m2r) - {ml^})'^, (12) 

where $\langle X_{nr} \rangle = \sum_r p_r X_{nr}$. The extremum conditions elucidate the
appearance of each phase. While we have $\hat{m}_{11} \neq \hat{m}_{12}$ with
$\langle \hat{m}_{1r} \rangle = 0$ in the detectable phase, the condition
$\hat{m}_{11} = \hat{m}_{12} = 0$ is satisfied in the undetectable phase. The transition
occurs when $\phi$ and $\hat{a}$ satisfy

o /j, c(l -f a^) 

5^2(0, q) = (13) 

In the detectable phase, $\phi$ and $\hat{a}$ are determined using the following extremum
conditions:

Ai(<^,a) = -^, (14) 

1 — 

R2 (</>, a) = (“ + f) ’ (^^(^ 

while they are constant in the undetectable phase. In both 
phases, we have D = 0, and the average first eigenvalue is 

[Ai]b = A (16) 

Note that (13) does not contain the resolution parameter $\theta$; therefore, the
detectability threshold is universal with respect to $\theta$.

There also exists a solution with $D \neq 0$ and $\hat{m}_{11} = \hat{m}_{12} \neq 0$. This
solution indicates the unpartitioned phase, and it is observed when the corresponding first
eigenvalue becomes larger than that of the detectable and undetectable phases. In this phase,
$\phi$ and $\hat{a}$ are determined using (14) and

R2{f>,d)= , I (17) 

The transition to the unpartitioned phase occurs when the values of $\phi$ and $\hat{a}$ for
the two phases coincide. The average first eigenvalue $[\lambda_1]_B$ does not depend on
$\Gamma$ and is constant in this phase. As we observe in the examples below, while the
transition from the detectable phase to the undetectable phase is continuous, the transition
from the detectable phase to the unpartitioned phase is abrupt.
Fig. 1: (Color online) Left: (a) Phase diagram of the random 3-regular graph with respect to the resolution parameter $\theta$ and
the strength of the block structure $c_{\mathrm{in}} - c_{\mathrm{out}}$. The region $c_{\mathrm{in}} - c_{\mathrm{out}} > 2c$ is invalid because we would have $c_{\mathrm{out}} < 0$. Right:
Fractions of correctly classified vertices and the eigenvalues of the modularity matrix in the random 3-regular graphs for (b)
$\theta = 0.5, 1, 2$ and (c) $\theta = 0.05$. The solid lines are the analytical results. The dots are the results of the numerical experiments, where
we set N = 10^, pi = P 2 = 0.5, and the average was taken over 20 samples. 


Random regular graph. — In the case of random 
regular graphs, the above results are exact, and we can analytically solve for the physical
quantities and the boundaries of the phases. The detectable phase of a random $c$-regular
graph has

$$ [\lambda_1]_B = (c - 1)\Gamma + \frac{1}{\Gamma}, \tag{18} $$

= ( 19 ) 

The undetectable phase has

$$ [\lambda_1]_B = 2\sqrt{c - 1}, \tag{20} $$

and the detectability threshold is

$$ \Gamma = \frac{1}{\sqrt{c - 1}}. \tag{21} $$


This is equal to the case of the normalized Laplacian [?]. Finally, the boundary of the
detectable phase and the unpartitioned phase is

$$ \Gamma_{\mathrm{un}} = \frac{c(1 - \theta) + \sqrt{c^2 (1 - \theta)^2 - 4(c - 1)}}{2(c - 1)}. \tag{22} $$


This is a monotonically decreasing function of $\theta$ that attains its minimum at

$$ \theta_{\max} = 1 - \frac{2\sqrt{c - 1}}{c}. \tag{23} $$


The corresponding value of $\Gamma$ coincides with the detectability threshold. Note that when
the graph is regular, the 1-vector is the leading eigenvector in the unpartitioned phase. Its
eigenvalue is $c(1 - \theta)$ and is equal to the eigenvalue of the undetectable phase at
$\theta_{\max}$. Hence, the region where both $\theta$ and $\Gamma$ are small is the
unpartitioned phase. Consequently, we obtain the phase diagram shown in Fig. 1(a). Following
the literature, we employed $c_{\mathrm{in}} - c_{\mathrm{out}}$, instead of $\Gamma$, to
indicate the strength of the block structure; in the case of equal-size blocks
$p_1 = p_2 = 0.5$, $c_{\mathrm{in}} - c_{\mathrm{out}}$ is twice the difference between the
average degree within a block and between blocks (see [13] for the relation between $\gamma$
and $c_{\mathrm{in}} - c_{\mathrm{out}}$).
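The closed-form expressions for the random $c$-regular graph are easy to evaluate numerically. The sketch below assumes that Eqs. (18) and (20)-(23) read $[\lambda_1]_B = (c-1)\Gamma + 1/\Gamma$, $[\lambda_1]_B = 2\sqrt{c-1}$, $\Gamma = 1/\sqrt{c-1}$, $\Gamma_{\mathrm{un}} = [c(1-\theta) + \sqrt{c^2(1-\theta)^2 - 4(c-1)}]/[2(c-1)]$, and $\theta_{\max} = 1 - 2\sqrt{c-1}/c$; all function names are hypothetical.

```python
import math

def lambda_detectable(c, gamma):
    """Average leading eigenvalue in the detectable phase, assumed Eq. (18)."""
    return (c - 1) * gamma + 1.0 / gamma

def lambda_undetectable(c):
    """Average leading eigenvalue in the undetectable phase, assumed Eq. (20)."""
    return 2.0 * math.sqrt(c - 1)

def lambda_unpartitioned(c, theta):
    """Eigenvalue c(1 - theta) of the 1-vector on a c-regular graph."""
    return c * (1.0 - theta)

def gamma_threshold(c):
    """Detectability threshold, assumed Eq. (21)."""
    return 1.0 / math.sqrt(c - 1)

def theta_max(c):
    """Largest theta with an unpartitioned phase, assumed Eq. (23)."""
    return 1.0 - 2.0 * math.sqrt(c - 1) / c

def gamma_unpartitioned(c, theta):
    """Detectable/unpartitioned boundary, assumed Eq. (22).

    Returns None when theta exceeds theta_max, where no such boundary exists.
    """
    disc = (c * (1.0 - theta)) ** 2 - 4.0 * (c - 1)
    if disc < -1e-12:
        return None
    disc = max(disc, 0.0)   # guard against rounding at theta = theta_max
    return (c * (1.0 - theta) + math.sqrt(disc)) / (2.0 * (c - 1))

c = 3
print(gamma_threshold(c), theta_max(c))
print(gamma_unpartitioned(c, 0.05), gamma_unpartitioned(c, 1.0))
```

The boundary (22) is the larger root of $(c-1)\Gamma + 1/\Gamma = c(1-\theta)$, i.e. the point where the detectable-phase eigenvalue equals the eigenvalue of the 1-vector. At $\theta_{\max}$ it meets the detectability threshold, which ties the three phases together in the phase diagram.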

The fractions of correctly classified vertices and the average first eigenvalues
$[\lambda_1]_B$ are plotted in Figs. 1(b) and 1(c). To draw the solid lines, we further
approximated the distribution of the eigenvector elements as Gaussian. We can confirm a
universal detectability curve for various sufficiently large values of $\theta$ and an abrupt
transition between the detectable and unpartitioned phases for a small value of $\theta$.

Recall that in the case of the normalized cut, the resolution parameter $\theta$ is the
optimum value of the objective function itself. Although the exact value of $\theta$ is not
known, it is bounded by the second-smallest eigenvalue of the normalized Laplacian [?] using
the Cheeger inequality [?]. The dashed line in Fig. 1(a) indicates the lower bound of
$\theta$, i.e., one half of the second-smallest eigenvalue of the normalized Laplacian, while
the upper bound is very large.
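The lower bound mentioned above can be computed directly for a given graph. The sketch below assumes the symmetric normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$ and returns one half of its second-smallest eigenvalue; the function name and the toy graph are hypothetical, and the graph is assumed to be connected with no isolated nodes.

```python
import numpy as np

def cheeger_lower_bound(A):
    """Half of the second-smallest eigenvalue of L = I - D^{-1/2} A D^{-1/2}."""
    A = np.asarray(A, dtype=float)
    c = A.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(c)            # assumes every degree is positive
    L = np.eye(len(c)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(L)        # eigenvalues in ascending order
    return float(eigvals[1]) / 2.0

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A_TOY = np.zeros((6, 6))
for i, j in EDGES:
    A_TOY[i, j] = A_TOY[j, i] = 1.0

# Minimum normalized cut of this graph: cutting the bridge gives 14 * 1 / (7 * 7).
THETA_STAR = 2.0 / 7.0

bound = cheeger_lower_bound(A_TOY)
print(bound, THETA_STAR)
```

Because $K |E(S_1, S_2)| / (K_1 K_2) \ge |E(S_1, S_2)| / \min(K_1, K_2)$, the Cheeger bound on the conductance carries over to the normalized cut, so the printed bound cannot exceed $\theta^*$.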

Stochastic block model. — Although there are 
many variants of the stochastic block model, we consider 
the most fundamental model. Pairs of nodes within the 
same module and between different modules are connected 
with probabilities $p_{\mathrm{in}}$ and $p_{\mathrm{out}}$ ($p_{\mathrm{in}} > p_{\mathrm{out}}$), respectively. 
That is, the nodes within the same module are more 
densely connected than the nodes in different modules. 
Because we are focusing on sparse graphs, we set both 
Fig. 2: (Color online) Behaviors of the spectral method for the stochastic block model with $c = 6$. Left: Fractions of correctly
classified vertices, average first eigenvalues $[\lambda_1]_B$, and IPR for (a) $\theta = 0.5, 1, 2$, and (b) $\theta = 0.25$. The dots are the results of the
numerical experiments, where we set N = 2 x 10"*, pi = P 2 ~ 0.5, and the average was taken over 20 samples. The solid line for 
$[\lambda_1]_B$ is the result of EMA. Right: (c) Phase diagram of the fraction of correctly classified vertices with respect to the resolution
parameter $\theta$ and the strength of the block structure $c_{\mathrm{in}} - c_{\mathrm{out}}$. Each point of the density plot indicates the average value over 20
samples. The dashed line is the lower bound of $\theta$ estimated using the Cheeger inequality, where we used the EMA solution [?]
for the second-smallest eigenvalue of the normalized Laplacian. The white lines indicate the results of EMA. (d) Phase diagram
of the detectability threshold with respect to the average degree $c$. The dashed line, solid line, and connected dots represent
the estimates of the detectability thresholds with the dense approximation [5], the threshold of the normalized Laplacian, and
the threshold of the modularity matrix with EMA, respectively. Although they are not shown, we also confirmed the universal behavior with
respect to $\theta$ for several unequal block sizes.


$c_{\mathrm{in}} = p_{\mathrm{in}} N$ and $c_{\mathrm{out}} = p_{\mathrm{out}} N$ to be $O(1)$.
This model has a Poisson degree distribution, and therefore, our EMA-based treatment no longer
offers the exact result. Because no bound exists for the maximum node degree, we need to rely
on the numerical estimate of the formal solution by truncating the infinite summations of
$R_n(\phi, \hat{a})$ and $S_n(\phi, \hat{a})$. The results of our replica analysis and those of
the corresponding numerical experiments are compared in Figs. 2(a)-(c). The comparison
indicates that our estimates offer very accurate predictions. The average first eigenvalue
$[\lambda_1]_B$ is fairly large even in the undetectable phase, which is consistent with [?].
Note that the degree of eigenvector localization, which we measured by the inverse
participation ratio (IPR), increases gradually around the detectability threshold; as far as we
explored, the value of IPR becomes significantly large only in the undetectable phase (see [13]
for the definition of IPR). Although it is difficult to prove whether the localized eigenvector
is absent in the detectable region, its influence seems negligible. This is consistent with
what we observed empirically [?].
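A numerical experiment of this kind needs three ingredients: a sampler for the two-block model, a localization measure, and an accuracy measure. The sketch below is illustrative only. It assumes equal-size blocks, takes the IPR of a vector normalized to unit length as $\sum_i x_i^4$ (the paper defers its definition to [13], so this common convention is an assumption), and scores accuracy up to a global label swap. All function names are hypothetical.

```python
import numpy as np

def sample_sbm(n, c_in, c_out, rng):
    """Sample a two-block graph with p_in = c_in / n and p_out = c_out / n.

    n is assumed even; returns the adjacency matrix and planted labels.
    """
    labels = np.repeat([1, -1], n // 2)
    same = labels[:, None] == labels[None, :]
    p = np.where(same, c_in / n, c_out / n)
    upper_mask = np.triu(np.ones((n, n), dtype=bool), k=1)
    upper = (rng.random((n, n)) < p) & upper_mask   # each pair drawn once
    A = (upper | upper.T).astype(float)
    return A, labels

def ipr(x):
    """Inverse participation ratio sum_i x_i^4 of the unit-normalized vector."""
    x = np.asarray(x, dtype=float)
    x = x / np.linalg.norm(x)
    return float(np.sum(x ** 4))

def fraction_correct(pred, truth):
    """Fraction of matching labels, maximized over the global sign flip."""
    agree = float(np.mean(np.asarray(pred) == np.asarray(truth)))
    return max(agree, 1.0 - agree)

rng = np.random.default_rng(0)
A, planted = sample_sbm(200, c_in=8.0, c_out=2.0, rng=rng)
print(A.sum() / 200)                 # empirical average degree
print(ipr(np.ones(200)))             # fully delocalized vector gives 1/n
```

With this convention a fully delocalized vector has IPR $1/N$ and a vector supported on a single node has IPR 1, so values that stay well above $1/N$ as $N$ grows signal localization.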


Summary and discussion. — In summary, we 
showed that the modularity maximization with the resolution parameter $\theta$ offers a
unifying framework of graph partitioning and analyzed the detectability of the spectral method
of the unifying framework. Our phase diagram shows that when the resolution parameter $\theta$
is sufficiently small, the unpartitioned phase appears before the detectability threshold as
the block structure is weakened; that is, even when the block structure is statistically
significant in the sense of the Bayesian inference, there exist cases where the graph has no
significant structure in the sense of an objective function. Otherwise, the detectability
threshold is universal irrespective of $\theta$. This behavior occurs probably because the
graph is assumed to be infinitely large in our analysis; because the order of the penalty
factor $(c^{\top} x)^2 / K$ is smaller than that of $x^{\top} A x$ in the detectable phase, the
value of $\theta$ does not affect the resulting performance. Our results imply that, whereas
the gap between the detectability threshold of the Bayesian inference and the spectral method
was mainly due to the eigenvector localization when the normalized Laplacian was used [?], it
is mainly because of the difference in the detectability threshold itself in the case of the
modularity matrix.
Appendix: Derivation of the saddle-point expression. — From (9), the average of the largest
eigenvalue $[\lambda_1]_B$ can be obtained by evaluating the moment of the “partition
function” as an analytic function of $n$. Assuming $n$ as a positive integer, we have





dXadnaS{\Xa\‘^ - N) S (fla - 



X exp 


/ 

I 2^ 




(A.l) 


where a is the replica index and the last factor represents 
the ensemble average over graphs with proper edge con¬ 
straints. We now introduce the following order parameter 
functions for the block r. 

1 "■ 

Qr{lJ‘) = ^ ^ ^ (^- 2 ) 

i&Vr a=l 


where $z_i$ is an auxiliary variable that originates from the edge constraint. Assuming that
(A.2) becomes invariant under any permutation of replica indices $a$ at the dominant saddle
point, we then express these order-parameter functions and their conjugates as Gaussian
mixtures as follows:


Qrifi) = / dAdHqr{A,H) 


(iA 

I 27r 


PA^ f H 


X exp 


Qr{^l) = A dAdH qr{A,H) 


X exp 


n 


a—1 


(A.3) 


(A.4) 


where {cpr — ^)/Np^ and q^q^ = c, as derived in 

[13]. These forms yield, in the limit A —>■ oo. 


[Ai]^ = 2 extr _— —fl 


qr-iqr ifp I 


-E 


CPr 


dAdHdA'dH' qr{A, H)qr{A, H) 


{H -b Hf 


A-A 


A 


1 

m 


E/ E/ / n (A^s^^gdriAg, Hg 


r,t ieVr,t ff=l 


(ctn-ElliHgX 

+ - ^2 [^Pr — l)SrB + 7(1 — <5rs)] / dAdH f dA'dH' 

r^s 

XqriA, H)qsiA',H') 


/ A'H^ + 2HH' + AH'^ 



AA'-l 

A 

A' Jj 


(A.5) 


In (A.5), $\hat{\Omega}$ is the conjugate of $\Omega$ in (A.1), and $\phi$ is another
auxiliary variable that originates from the normalization constraint $|x|^2 = N$.